Large-Scale Comparative Annotation of Bacterial Genomes
Abstract
This MSc thesis describes the design and evaluation of a comparative annotation system for prokaryotic genomes. To annotate genes accurately, the system combines comparative and ab initio gene-finding techniques. It consists of three main modules: a Markov-model-based gene-finding module, an alignment-based profiler, and a comparative module that combines the results of the other two. The system first builds a multiple sequence alignment of the input sequences and, independently, predicts genes in each sequence using a combination of Markov models of orders one to five. These two steps have no interdependencies and run in parallel. The profiler then derives a profile from the multiple sequence alignment that emphasizes regions of high conservation across the sequences. The predictions from the Markov-model module are then processed through this profile to produce the final comparative annotation. The system was evaluated on a series of test sets containing an increasing number of sequences. The results were encouraging: the gene of interest was found in every test, and the comparative step removed false positives very accurately. Moreover, as expected, increasing the number of sequences in a test set markedly improved both the predictions and their confidence scores.
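The pipeline in the abstract can be illustrated with a minimal sketch. The abstract does not say how the orders are combined, how conservation is measured, or how predictions are filtered, so every choice below is an assumption. The Markov models of orders one to five are combined by simply averaging their per-base log-likelihood ratios (coding vs. background). Conservation is taken as the fraction of sequences that share the majority base in each alignment column. A prediction is kept only if it scores positively and its span is conserved above a hypothetical threshold of 0.8. The function names (`train_markov`, `llr_score`, `combined_score`, `conservation_profile`, `comparative_filter`) are hypothetical. The multiple sequence alignment is assumed to be precomputed, and predictions are assumed to be already mapped to alignment-column coordinates.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGT"


def train_markov(seqs, order):
    """Fit an order-k Markov chain with add-one (Laplace) smoothing.

    Returns log_prob(context, base) = log P(base | preceding `order` bases).
    """
    counts = defaultdict(Counter)
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1

    def log_prob(context, base):
        c = counts.get(context, Counter())  # unseen context -> uniform estimate
        return math.log((c[base] + 1) / (sum(c.values()) + len(ALPHABET)))

    return log_prob


def llr_score(seq, coding, background, order):
    """Per-base log-likelihood ratio of the coding vs. background model."""
    terms = [
        coding(seq[i - order:i], seq[i]) - background(seq[i - order:i], seq[i])
        for i in range(order, len(seq))
    ]
    return sum(terms) / len(terms) if terms else 0.0


def combined_score(seq, coding_models, background_models):
    """Average the LLR over all model orders (assumed combination rule)."""
    scores = [
        llr_score(seq, coding_models[k], background_models[k], k)
        for k in coding_models
    ]
    return sum(scores) / len(scores)


def conservation_profile(alignment):
    """Fraction of sequences sharing the majority base per column ('-' = gap)."""
    n = len(alignment)
    profile = []
    for column in zip(*alignment):
        bases = [b for b in column if b != "-"]
        top = Counter(bases).most_common(1)[0][1] if bases else 0
        profile.append(top / n)
    return profile


def comparative_filter(predictions, profile, min_conservation=0.8):
    """Keep positive-scoring predictions whose aligned span is well conserved.

    predictions: (start, end, score) tuples in alignment-column coordinates,
    end exclusive. The 0.8 default threshold is hypothetical.
    """
    kept = []
    for start, end, score in predictions:
        window = profile[start:end]
        if window and score > 0 and sum(window) / len(window) >= min_conservation:
            kept.append((start, end))
    return kept


if __name__ == "__main__":
    orders = range(1, 6)  # orders one to five, as in the abstract
    coding_train = ["ATGATGATGATG"]      # toy stand-in for known genes
    background_train = ["ACACACACACAC"]  # toy stand-in for non-coding DNA
    coding = {k: train_markov(coding_train, k) for k in orders}
    background = {k: train_markov(background_train, k) for k in orders}

    score = combined_score("ATGATGATG", coding, background)
    profile = conservation_profile(["ACGT", "ACGA", "AC-T"])
    print(round(score, 3), comparative_filter([(0, 2, score), (2, 4, score)], profile))
```

In this toy run, both candidate spans get the same positive Markov score. Only the first span lies in fully conserved alignment columns, so only `(0, 2)` survives the comparative filter. This mirrors, under the stated assumptions, how the profile is meant to prune false positives that the ab initio module alone would keep.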
Similar resources
Restauro-G: A Rapid Genome Re-Annotation System for Comparative Genomics
Annotations of complete genome sequences submitted directly from sequencing projects are diverse in terms of annotation strategies and update frequencies. These inconsistencies make comparative studies difficult. To allow rapid data preparation of a large number of complete genomes, automation and speed are important for genome re-annotation. Here we introduce an open-source rapid genome re-ann...
Annotation, comparison and databases for hundreds of bacterial genomes.
The multitude of bacterial genome sequences being determined has opened up a new field of research, that of comparative genomics. One role of bioinformatics is to assist biologists in the extraction of biological knowledge from this data flood. Software designed for the analysis and functional annotation of a single genome have, in consequence, evolved towards comparative genomics tools, bringi...
Ensembl Genomes (non-chordates): Quick tour
Ensembl Bacteria [3], Protists [4], Fungi [5], Plants [6] and Metazoa [7] (collectively, ‘Ensembl Genomes’) are five portals for genome-scale data, developed in close collaboration with scientific communities expert in the biology of individual species. Implemented using the Ensembl software suite for genome analysis and browsing, which was developed for the study of vertebrate genomes (describ...
MicroScope: a platform for microbial genome annotation and comparative genomics
The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific ...
Bacterial genomes as new gene homes: the genealogy of ORFans in E. coli.
Differences in gene repertoire among bacterial genomes are usually ascribed to gene loss or to lateral gene transfer from unrelated cellular organisms. However, most bacteria contain large numbers of ORFans, that is, annotated genes that are restricted to a particular genome and that possess no known homologs. The uniqueness of ORFans within a genome has precluded the use of a comparative appro...